Stanford University, Yahoo! Research and Stanford University

Classical multidimensional scaling (MDS) is a method for visualizing high-dimensional point clouds by mapping them to low-dimensional Euclidean space. This mapping is defined in terms of eigenfunctions of a matrix of interpoint dissimilarities. In this paper we analyze in detail multidimensional scaling applied to a specific dataset: the 2005 United States House of Representatives roll call votes. Certain MDS and kernel projections output "horseshoes" that are characteristic of dimensionality reduction techniques. We show that, in general, a latent ordering of the data gives rise to these patterns when one only has local information, that is, when only the interpoint distances for nearby points are known accurately. Our results provide a rigorous analysis of, and insight into, manifold learning in the special case where the manifold is a curve.

1. Introduction. Classical multidimensional scaling is a widely used technique for dimensionality reduction in complex data sets, a central problem in pattern recognition and machine learning. In this paper we carefully analyze the output of MDS applied to the 2005 United States House of Representatives roll call votes [Office of the Clerk, U.S. House of Representatives (2005)]. The results we find appear stable over recent years. The resulting 3-dimensional mapping of legislators shows "horseshoes" that are characteristic of a number of dimensionality reduction techniques, including principal components analysis and correspondence analysis. These patterns are heuristically attributed to a latent ordering of the data, for example, the ranking of politicians within a left-right spectrum. Our work lends insight into this heuristic, and we present a rigorous analysis of the "horseshoe phenomenon."
Seriation in archaeology was the main motivation behind D. Kendall's discovery of this phenomenon [Kendall (1970)]. Ordination techniques are part of the ecologists' standard toolbox [ter Braak (1985, 1987), Wartenberg, Ferson and Rohlf (1987)]. There are hundreds of examples of horseshoes occurring in real statistical applications. For instance, Dufrene and Legendre (1991) found that when they analyzed the available potential ecological factors, scored as presence/absence in 10 km side squares in Belgium, there was a strong underlying gradient in the data set which induced "an extraordinary horseshoe effect." This gradient closely followed the altitude component. Mike Palmer has a wonderful "ordination website" where he shows an example of a contingency table crossing species counts in different locations around Boomer Lake [Palmer (2008)]. He shows a horseshoe effect where the gradient is the distance to the water. Psychologists encountered the same phenomenon and call it the Guttman effect after Guttman (1968). Standard texts such as Mardia, Kent and Bibby (1979), page 412, claim horseshoes result from ordered data in which only local interpoint distances can be estimated accurately. The mathematical analysis we provide shows that applying the exponential kernel to any distance, which downweights points that are far apart, also produces such horseshoes.

Methods for accounting for gradients [ter Braak and Prentice (1988)] or removing them [Hill and Gauch (1980)], that is, detrending the axes, are standard in the analysis of MDS with chi-square distances, known as correspondence analysis.

Some mathematical insights into the horseshoe phenomenon have been 
proposed [Podani and Miklos (2002), Iwatsubo (1984)]. 

The paper is structured as follows. In Section 1.1 we describe our data set and briefly discuss the output of MDS applied to these data. Section 1.2 describes the MDS method in detail. Section 2 states our main assumption, that legislators can be isometrically mapped into an interval, and presents a simple model for voting that is consistent with this metric requirement. In Section 3 we analyze the model and present the main results of the paper. Section 4 connects the model back to the data. The proofs of the theoretical results from Section 3 are presented in the Appendix.

1.1. The voting data. We apply multidimensional scaling to data generated by members of the 2005 United States House of Representatives, with similarity between legislators defined via roll call votes (Office of the Clerk, U.S. House of Representatives). A full House consists of 435 members, and in 2005 there were 671 roll calls. The first two roll calls were a call of the House by States and the election of the Speaker, and so were excluded from our analysis. Hence, the data can be ordered into a $435 \times 669$ matrix $D = (d_{ij})$ with $d_{ij} \in \{1/2, -1/2, 0\}$ indicating, respectively, a vote of "yea," "nay," or "not voting" by Representative $i$ on roll call $j$. (Technically, a representative can vote "present," but for purposes of our analysis this was treated as equivalent to "not voting.") We further restricted our analysis to the 401 Representatives that voted on at least 90% of the roll calls (220 Republicans, 180 Democrats and 1 Independent), leading to a $401 \times 669$ matrix $V$ of voting data. This step removed, for example, the Speaker of the House Dennis Hastert (R-IL), who by custom votes only when his vote would be decisive, and Robert T. Matsui (D-CA), who passed away at the start of the term.
As a first step, we define an empirical distance between legislators as

$$\hat{d}(l_i, l_j) = \frac{1}{669} \sum_{k=1}^{669} |v_{ik} - v_{jk}|. \tag{1.1}$$

Roughly, $\hat{d}(l_i, l_j)$ is the proportion of roll calls on which legislators $l_i$ and $l_j$ disagreed. This interpretation would be exact if not for the possibility of "not voting." In Section 2 we give some theoretical justification for this choice of distance, but it is nonetheless a natural metric on these data.
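A minimal sketch of the 90% participation filter and of the distance (1.1) follows, assuming the votes sit in a NumPy array with one row per Representative and entries in {1/2, -1/2, 0}. The function names `participation_filter` and `empirical_distance` are illustrative and not part of the original analysis, and the toy vote matrix is made up.

```python
import numpy as np

def participation_filter(V, min_rate=0.9):
    """Keep rows (legislators) with a yea/nay vote on at least min_rate of roll calls.

    "Not voting" and "present" are both coded as 0, as in the text.
    Returns the filtered matrix and the boolean mask of kept rows.
    """
    V = np.asarray(V, dtype=float)
    keep = (V != 0).mean(axis=1) >= min_rate
    return V[keep], keep

def empirical_distance(V):
    """Empirical distance (1.1): mean of |v_ik - v_jk| over roll calls k.

    With yea = 1/2 and nay = -1/2, a disagreement contributes 1, an agreement 0,
    and a vote paired with a 0 ("not voting") contributes 1/2.
    """
    V = np.asarray(V, dtype=float)
    n, m = V.shape
    D = np.zeros((n, n))
    for k in range(m):  # one roll call at a time keeps memory at O(n^2)
        col = V[:, k]
        D += np.abs(col[:, None] - col[None, :])
    return D / m

# Toy vote matrix: 3 legislators x 4 roll calls (hypothetical values).
V = np.array([[0.5, 0.5, -0.5, 0.5],
              [0.5, -0.5, -0.5, 0.0],
              [-0.5, -0.5, 0.5, 0.5]])
D_hat = empirical_distance(V)
V_kept, keep = participation_filter(V)
```

Looping over roll calls is a deliberate choice: broadcasting all pairs at once would allocate an n x n x m array, which for 401 x 401 x 669 entries is large.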

Now, it is reasonable that the empirical distance above captures the similarity of nearby legislators. To reflect the fact that $\hat{d}$ is most meaningful at small scales, we define the proximity

$$P(i,j) = 1 - \exp(-\hat{d}(l_i, l_j)).$$

Then $P(i,j) \approx \hat{d}(l_i, l_j)$ for $\hat{d}(l_i, l_j) \ll 1$, and, since its derivative $e^{-\hat{d}}$ shrinks as $\hat{d}$ grows, $P(i,j)$ is not as sensitive to noise around relatively large values of $\hat{d}(l_i, l_j)$. This localization is a common feature of dimensionality reduction algorithms, for example, eigenmap [Niyogi (2003)], isomap [Tenenbaum, de Silva and Langford (2000)], local linear embedding [Roweis and Saul (2000)] and kernel PCA [Scholkopf, Smola and Muller (1998)].

Fig. 1. 3-dimensional MDS output of legislators based on the 2005 U.S. House roll call votes. Color has been added to indicate the party affiliation of each Representative.

We apply MDS by double centering the dissimilarity matrix $P$, treated as a matrix of squared distances, and plotting the first three eigenfunctions scaled by the square roots of their eigenvalues (see Section 1.2 for details). Figure 1 shows the results of the 3-dimensional MDS mapping. The most striking feature of the mapping is that the data separate into "twin horseshoes." We have added color to indicate the political party affiliation of each Representative (blue for Democrat, red for Republican and green for the lone independent, Rep. Bernie Sanders of Vermont). The output from MDS is qualitatively similar to that obtained from other dimensionality reduction techniques, such as principal components analysis applied directly to the voting matrix $V$.
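The pipeline just described (proximity, double centering, scaled eigenvectors) can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: we assume, as in Section 3, that $P$ itself is double centered as $S = -\tfrac{1}{2} HPH$, and we scale each eigenvector by the square root of its eigenvalue. The helper `mds_coordinates` is hypothetical, and the input uses synthetic equally spaced points in place of the roll call data.

```python
import numpy as np

def mds_coordinates(D_hat, k=3):
    """Classical MDS on the proximity P = 1 - exp(-D_hat).

    Returns an (n, k) array whose columns are the top-k eigenvectors of
    S = -1/2 H P H, each scaled by the square root of its eigenvalue,
    together with the top-k eigenvalues (largest first).
    """
    P = 1.0 - np.exp(-np.asarray(D_hat, dtype=float))  # local proximity: P ~ D_hat when D_hat << 1
    n = P.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n                 # centering matrix H = I - (1/n) 1 1^T
    S = -0.5 * H @ P @ H
    evals, evecs = np.linalg.eigh(S)                    # ascending order for symmetric S
    idx = np.argsort(evals)[::-1][:k]                   # indices of the k largest eigenvalues
    lam = evals[idx]
    coords = evecs[:, idx] * np.sqrt(np.clip(lam, 0.0, None))
    return coords, lam

# Hypothetical input: points on a line with distance |l_i - l_j| (the Section 2 model).
l = np.linspace(0.0, 1.0, 50)
D_hat = np.abs(l[:, None] - l[None, :])
coords, lam = mds_coordinates(D_hat, k=3)
```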

In Sections 2 and 3 we build and analyze a model for the data in an effort to understand and interpret these pictures. Roughly, our theory predicts that the Democrats, for example, are ordered along the blue curve in correspondence with their political ideology, that is, how far they lean to the left. In Section 4 we discuss connections between the theory and the data. In particular, we explain why in the data legislators at the political extremes are not quite at the tips of the projected curves, but rather are positioned slightly toward the center.

1.2. Multidimensional scaling. Multidimensional scaling (MDS) is a widely used technique for approximating the interpoint distances, or dissimilarities, of points in a high-dimensional space by actual distances between points in a low-dimensional Euclidean space. See Young and Householder (1938) and Torgerson (1952) for early, clear references, Shepard (1962) for extensions from distances to ranked similarities, and Mardia, Kent and Bibby (1979), Cox and Cox (2000) and Borg and Groenen (1997) for useful textbook accounts. In our setting, applying the usual centering operations of MDS to the proximities we use as data leads to surprising numerical coincidences: the eigenfunctions of the centered matrices are remarkably close to the eigenfunctions of the original proximity matrix. The development below unravels this finding and describes the multidimensional scaling procedure in detail.

Euclidean points: If $x_1, x_2, \ldots, x_n \in \mathbb{R}^p$, let

$$D = \big(d_{i,j}\big)_{1 \le i,j \le n}, \qquad d_{i,j} = \|x_i - x_j\|_2,$$

be the interpoint distance matrix. Schoenberg [Schoenberg (1935)] characterized distance matrices and gave an algorithmic solution for finding the points given the distances (see below). Albouy (2004) discusses the history of this problem, tracing it back to Borchardt (1866). Of course, the points can only be reconstructed up to translation and rotation; thus, we assume $\sum_{i=1}^n x_{ik} = 0$ for all $k$, that is, the points are centered.

To describe Schoenberg's procedure, first organize the unknown points into an $n \times p$ matrix $X$ and consider the matrix of dot products $S = XX^T$, that is, $S_{ij} = x_i x_j^T$. Then the spectral theorem for symmetric matrices yields $S = U\Lambda U^T$ for orthogonal $U$ and diagonal $\Lambda$. Thus, a set of $n$ vectors which yield $S$ is given by $X = U\Lambda^{1/2}$. Of course, we can only retrieve $X$ up to an orthonormal transformation. This reduces the problem to finding the dot product matrix $S$ from the interpoint distances. For this, observe

$$d_{i,j}^2 = (x_i - x_j)(x_i - x_j)^T = x_i x_i^T + x_j x_j^T - 2 x_i x_j^T$$

or

$$D_2 = s\mathbf{1}^T + \mathbf{1}s^T - 2S, \tag{1.2}$$

where $D_2$ is the $n \times n$ matrix of squared distances, $s$ is the $n \times 1$ vector of the diagonal entries of $S$, and $\mathbf{1}$ is the $n \times 1$ vector of ones. The matrix $S$ can be obtained by double centering $D_2$:

$$S = -\tfrac{1}{2} H D_2 H, \qquad H = I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T. \tag{1.3}$$

To see this, first note that, for any matrix $A$, $HAH$ centers the rows and columns to have mean 0. Consequently, $Hs\mathbf{1}^T H = H\mathbf{1}s^T H = 0$, since the rows of $s\mathbf{1}^T$ and the columns of $\mathbf{1}s^T$ are constant. Pre- and post-multiplying (1.2) by $H$, we have

$$HD_2H = -2HSH.$$

Since the $x$'s were chosen as centered, $X^T\mathbf{1} = 0$, the row sums of $S$ satisfy

$$\sum_j x_i x_j^T = x_i \Big(\sum_j x_j\Big)^T = 0,$$

so $HSH = S$, and hence $S = -\tfrac{1}{2} HD_2H$ as claimed.
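The identities (1.2) and (1.3) are easy to confirm numerically. The sketch below, on a small random centered point cloud (sizes and seed are arbitrary choices), rebuilds $D_2$ from $S$ via (1.2) and recovers $S$ by double centering.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))
X = X - X.mean(axis=0)                 # center the points so that X^T 1 = 0

S = X @ X.T                            # dot-product (Gram) matrix
s = np.diag(S)[:, None]                # column vector of squared norms
one = np.ones((n, 1))
D2 = s @ one.T + one @ s.T - 2 * S     # identity (1.2)

# Squared interpoint distances computed directly, for comparison.
D2_direct = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)

H = np.eye(n) - one @ one.T / n        # centering matrix H = I - (1/n) 1 1^T
S_recovered = -0.5 * H @ D2 @ H        # double centering (1.3)
```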

In summary, given an $n \times n$ matrix of interpoint distances, one can solve for points achieving these distances by the following:

1. Double centering the squared interpoint distance matrix: $S = -\tfrac{1}{2} HD_2H$.

2. Diagonalizing $S$: $S = U\Lambda U^T$.

3. Extracting $X$: $X = U\Lambda^{1/2}$.
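The three steps can be sketched as follows. The helper name `classical_scaling` is hypothetical, and clipping small negative eigenvalues to zero is our own safeguard against rounding. The check confirms that the recovered points reproduce the original distances, even though the coordinates themselves differ by a rotation and translation.

```python
import numpy as np

def classical_scaling(D):
    """Recover centered points from an n x n Euclidean distance matrix D.

    Step 1: double center D**2. Step 2: diagonalize. Step 3: scale the
    eigenvectors by square roots of the eigenvalues (largest first).
    """
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    S = -0.5 * H @ (D ** 2) @ H                    # step 1
    lam, U = np.linalg.eigh(S)                     # step 2 (ascending eigenvalues)
    lam, U = lam[::-1], U[:, ::-1]                 # reorder so eigenvalues decrease
    return U * np.sqrt(np.clip(lam, 0.0, None))    # step 3: X = U Lambda^{1/2}

def pairwise_distances(X):
    return np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

# Usage: planar points are recovered up to translation and rotation.
rng = np.random.default_rng(1)
X = rng.normal(size=(8, 2))
Y = classical_scaling(pairwise_distances(X))
```

Only the first two columns of `Y` are nonzero here, because the original points live in the plane.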

Approximate distance matrices: The analysis above assumes that one starts with points $x_1, x_2, \ldots, x_n$ in a $p$-dimensional Euclidean space. We may want to find an embedding $x_i \mapsto y_i$ in a space of dimension $k < p$ that preserves the interpoint distances as closely as possible. Assume that $S = U\Lambda U^T$ is such that the diagonal entries of $\Lambda$ are decreasing. Take the first $k$ columns of $U$ and scale them so that their squared norms are equal to the corresponding eigenvalues $\lambda_1, \ldots, \lambda_k$. In particular, this provides the first $k$ columns of $X$ above and solves the minimization problem

$$\min_{y_1, \ldots, y_n \in \mathbb{R}^k} \sum_{i,j} \big(\|x_i - x_j\|_2^2 - \|y_i - y_j\|_2^2\big). \tag{1.4}$$

Young and Householder (1938) showed that this minimization can be realized as an eigenvalue problem; see the proof in this context in Mardia, Kent and Bibby (1979), page 407. In applications, an observed matrix $D$ is often not based on Euclidean distances (but may represent "dissimilarities," or just the difference of ranks). Then, the MDS solution is a heuristic for finding points in a Euclidean space whose interpoint distances approximate the orders of the dissimilarities $D$. This is called nonmetric MDS [Shepard (1962)].
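For centered Euclidean points, the truncation above has a simple residual: summing over all ordered pairs, the objective in (1.4) evaluated at the first $k$ principal coordinates equals $2n$ times the sum of the discarded eigenvalues. The sketch below checks this identity numerically on random data; the dimensions, scales and seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 10, 4, 2
X = rng.normal(size=(n, p)) * np.array([3.0, 2.0, 0.5, 0.2])  # unequal spread per axis
X -= X.mean(axis=0)                        # centered points

S = X @ X.T
lam, U = np.linalg.eigh(S)
lam, U = lam[::-1], U[:, ::-1]             # decreasing eigenvalues
lam = np.clip(lam, 0.0, None)              # remove rounding-level negatives
Y = U[:, :k] * np.sqrt(lam[:k])            # first k principal coordinates

def sq_dists(Z):
    return ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)

# Each term of (1.4) is the squared distance between the discarded coordinates,
# and for centered coordinates those sum to 2n times the discarded eigenvalues.
loss = (sq_dists(X) - sq_dists(Y)).sum()   # objective in (1.4), over all ordered pairs
predicted = 2 * n * lam[k:].sum()
```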

Kernel methods: MDS converts similarities into inner products, whereas modern kernel methods [Scholkopf, Smola and Muller (1998)] start with a given matrix of inner products. Williams (2000) pointed out that kernel PCA [Scholkopf, Smola and Muller (1998)] is equivalent to metric MDS in feature space when the kernel function is chosen isotropic, that is, the kernel $K(x,y)$ only depends on the norm $\|x - y\|$. The kernels we focus on in this paper have that property. We will show a decomposition of the horseshoe phenomenon for one particular isotropic kernel, the one defined by the kernel function $k(x_i, x_j) = \exp(-\theta (x_i - x_j)^T (x_i - x_j))$.
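The equivalence noted by Williams (2000) can be seen directly for the isotropic kernel above: since $k(x, x) = 1$, feature-space squared distances are $2 - 2k(x_i, x_j)$, and double centering them gives exactly the centered kernel matrix $HKH$ used by kernel PCA. The sketch below checks this on random planar points; the value of $\theta$ and the seed are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta = 7, 0.5                              # theta: kernel parameter (illustrative value)
X = rng.normal(size=(n, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
K = np.exp(-theta * sq)                        # isotropic kernel: depends only on ||x - y||

# Squared feature-space distances: k(x,x) + k(y,y) - 2 k(x,y) = 2 - 2 K.
D2_feature = 2.0 - 2.0 * K

H = np.eye(n) - np.ones((n, n)) / n
S_mds = -0.5 * H @ D2_feature @ H              # metric MDS on feature-space distances
K_centered = H @ K @ H                         # centered kernel used by kernel PCA
```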

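Relating the eigenfunctions of $S$ to those of $D_2$: In practice, it is easier to think about the eigenfunctions of the squared distances matrix $D_2$ rather than the recentered matrix $S = -\tfrac{1}{2} HD_2H$.

Observe that if $v$ is any vector such that $\mathbf{1}^T v = 0$ (i.e., the entries of $v$ sum to 0), then

$$Hv = \Big(I - \tfrac{1}{n}\mathbf{1}\mathbf{1}^T\Big)v = v.$$

Now, suppose $w$ is an eigenfunction of $D_2$ with eigenvalue $\lambda$, write $\mu = \frac{1}{n}\sum_{i=1}^n w_i$ for its mean, and let

$$\bar{w} = \mu\mathbf{1}$$

be the constant vector whose entries are the mean of $w$. Then $\mathbf{1}^T(w - \bar{w}) = 0$ and

$$\begin{aligned}
S(w - \bar{w}) &= -\tfrac{1}{2} HD_2H(w - \bar{w}) \\
&= -\tfrac{1}{2} HD_2(w - \bar{w}) \\
&= -\tfrac{1}{2} H(\lambda w - \lambda\bar{w} + \lambda\bar{w} - D_2\bar{w}) \\
&= -\tfrac{\lambda}{2}(w - \bar{w}) + \tfrac{\mu}{2}(\eta - \bar{\eta}),
\end{aligned}$$

where $\eta_i = \sum_{j=1}^n (D_2)_{ij}$ are the row sums of $D_2$ and $\bar{\eta} = \big(\frac{1}{n}\sum_{i=1}^n \eta_i\big)\mathbf{1}$. In short, if $w$ is an eigenfunction of $D_2$ and $\mu = 0$, then $w$ is also an eigenfunction of $S$, with eigenvalue $-\lambda/2$. By continuity, if $\mu \approx 0$ or $\eta \approx \bar{\eta}$, then $w - \bar{w}$ is an approximate eigenfunction of $S$. In our setting, it turns out that the matrix $D_2$ has approximately constant row sums (so $\eta \approx \bar{\eta}$), and its eigenfunctions satisfy $\mu \approx 0$ (in fact, some satisfy $\mu = 0$). Consequently, the eigenfunctions of the centered and uncentered matrices are approximately the same in our case.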
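The identity derived above holds for any symmetric matrix and any of its eigenpairs, so it can be checked numerically. The sketch below uses a random symmetric matrix as a stand-in for $D_2$ (an assumption made only for this check) and compares both sides.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
A = rng.normal(size=(n, n))
D2 = A + A.T                                   # symmetric stand-in for D_2
H = np.eye(n) - np.ones((n, n)) / n
S = -0.5 * H @ D2 @ H

lam_all, W = np.linalg.eigh(D2)
lam, w = lam_all[0], W[:, 0]                   # one eigenpair (lambda, w) of D_2
mu = w.mean()                                  # mean of w, so w_bar = mu * 1
eta = D2.sum(axis=1)                           # row sums eta_i
w_c = w - mu                                   # centered eigenvector w - w_bar

lhs = S @ w_c
rhs = -0.5 * lam * w_c + 0.5 * mu * (eta - eta.mean())
```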

2. A model for the data. We begin with a brief review of models for this type of data. In spatial models of roll call voting, legislators and policies are represented by points in a low-dimensional Euclidean space with votes decided by maximizing a deterministic or stochastic utility function (each legislator choosing the policy maximizing their utility). For a precise description of these techniques, see de Leeuw (2005), where he treats the particular case of roll call data such as ours.

Since Coombs (1964), it has been understood that there is usually a natural left-right (i.e., unidimensional) model for political data. Recent comparisons [Burden, Caldeira and Groseclose (2000)] between the available left-right indices have shown that there is little difference between them, and that indices based on multidimensional scaling [Heckman and Snyder (1997)] perform well. Further, Heckman and Snyder (1997) conclude "standard roll call measures are good proxies of personal ideology and are still among the best measures available."

In empirical work it is often convenient to specify a parametric family of utility functions. In that context, the central problem is then to estimate those parameters and to find "ideal points" for both the legislators and the policies. A robust Bayesian procedure for parameter estimation in spatial models of roll call data was introduced in Clinton, Jackman and Rivers (2004), and provides a statistical framework for testing models of legislative behavior.

Our cut-point model is a bit different and is explained next. Although the empirical distance (1.1) is arguably a natural one to use on our data, we further motivate this choice by considering a theoretical model in which legislators lie on a regular grid in a unidimensional policy space. In this idealized model it is natural to identify legislators $l_i$, $1 \le i \le n$, with points in the interval $I = [0, 1]$ in correspondence with their political ideologies. We define the distance between legislators to be

$$d(l_i, l_j) = |l_i - l_j|.$$

This assumption that legislators can be isometrically mapped into an interval is key to our analysis. In the "cut-point model" for voting, each bill $1 \le k \le m$ on which the legislators vote is represented as a pair

$$(C_k, P_k) \in [0, 1] \times \{0, 1\}.$$

We can think of $P_k$ as indicating whether the bill is liberal ($P_k = 0$) or conservative ($P_k = 1$), and we can take $C_k$ to be the cut-point between legislators that vote "yea" or "nay." Let $V_{ik} \in \{1/2, -1/2\}$ indicate how legislator $l_i$ votes on bill $k$. Then, in this model,

$$V_{ik} = \begin{cases} 1/2 - P_k, & l_i < C_k, \\ P_k - 1/2, & l_i > C_k. \end{cases}$$

As described, the model has $n + 2m$ parameters, one for each legislator and two for each bill. These parameters are not identifiable without further restrictions: adding the same constant $\epsilon$ to every $l_i$ and every $C_k$ results in the same votes. Below we fix this problem by specifying values for the $l_i$ and a distribution on $\{C_k\}$.

We reduce the number of parameters by assuming that the cut-points are independent random variables uniform on $I$. Then,

$$\mathbb{P}(V_{ik} \neq V_{jk}) = d(l_i, l_j), \tag{2.1}$$

since legislators $l_i$ and $l_j$ take opposite sides on a given bill if and only if the cut-point $C_k$ divides them. Observe that the parameters $P_k$ do not affect the probability above.
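A small simulation illustrates (2.1) under the cut-point model. It assumes uniform cut-points and a made-up set of legislator positions; the number of bills and the random seed are also illustrative choices. It also shows that the orientations $P_k$ do not change the distances: recomputing with every $P_k = 0$ gives identical values.

```python
import numpy as np

rng = np.random.default_rng(5)
l = np.array([0.1, 0.25, 0.6, 0.9])            # hypothetical legislator positions in I = [0, 1]
m = 20000                                      # number of simulated bills
C = rng.uniform(0.0, 1.0, size=m)              # cut-points C_k ~ Uniform[0, 1]
P = rng.integers(0, 2, size=m)                 # P_k: 0 = liberal bill, 1 = conservative bill

# Vote rule of the cut-point model: V_ik = 1/2 - P_k if l_i < C_k, else P_k - 1/2.
left = l[:, None] < C[None, :]
V = np.where(left, 0.5 - P[None, :], P[None, :] - 0.5)

# Empirical distance: fraction of bills on which two legislators disagree.
d_hat = np.abs(V[:, None, :] - V[None, :, :]).mean(axis=2)
d_true = np.abs(l[:, None] - l[None, :])

V0 = np.where(left, 0.5, -0.5)                 # same cut-points, every P_k set to 0
d_hat_no_p = np.abs(V0[:, None, :] - V0[None, :, :]).mean(axis=2)
```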

The empirical distance (1.1) between legislators $l_i$ and $l_j$ generalizes to

$$\hat{d}_m(l_i, l_j) = \frac{1}{m}\sum_{k=1}^m |V_{ik} - V_{jk}| = \frac{1}{m}\sum_{k=1}^m \mathbf{1}_{V_{ik} \neq V_{jk}}.$$

By (2.1), we can estimate the latent distance $d$ between legislators by the empirical distance $\hat{d}_m$, which is computable from the voting record. In particular,

$$\lim_{m \to \infty} \hat{d}_m(l_i, l_j) = d(l_i, l_j) \quad \text{a.s.},$$

since we assumed the cut-points are independent. More precisely, we have the following result:

Lemma 2.1. For $m > \log(n/\sqrt{\epsilon})/\epsilon^2$,

$$\mathbb{P}\big(|\hat{d}_m(l_i, l_j) - d(l_i, l_j)| \le \epsilon \ \ \forall\, 1 \le i, j \le n\big) \ge 1 - \epsilon.$$
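Proof. By the Hoeffding inequality, for fixed $l_i$ and $l_j$,

$$\mathbb{P}\big(|\hat{d}_m(l_i, l_j) - d(l_i, l_j)| > \epsilon\big) \le 2e^{-2m\epsilon^2}.$$

Consequently,

$$\mathbb{P}\Big(\bigcup_{1 \le i < j \le n} \big\{|\hat{d}_m(l_i, l_j) - d(l_i, l_j)| > \epsilon\big\}\Big) \le \sum_{1 \le i < j \le n} \mathbb{P}\big(|\hat{d}_m(l_i, l_j) - d(l_i, l_j)| > \epsilon\big) \le n^2 e^{-2m\epsilon^2} < \epsilon$$

for $m > \log(n/\sqrt{\epsilon})/\epsilon^2$, and the result follows. □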
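To get a feel for the sample size in Lemma 2.1, the following sketch computes the smallest integer $m$ satisfying the condition and evaluates the union bound from the proof at that $m$. The helper names are hypothetical, and the choice $n = 401$, $\epsilon = 0.05$ is only an example.

```python
import math

def min_bills(n, eps):
    """Smallest integer m with m > log(n / sqrt(eps)) / eps**2 (condition of Lemma 2.1)."""
    return math.floor(math.log(n / math.sqrt(eps)) / eps ** 2) + 1

def union_bound(n, eps, m):
    """Union bound from the proof: n(n-1)/2 pairs, each failing with prob. <= 2 exp(-2 m eps^2)."""
    return n * (n - 1) / 2 * 2 * math.exp(-2 * m * eps ** 2)

m = min_bills(401, 0.05)   # 401 legislators, accuracy eps = 0.05
```

For these values the lemma asks for roughly 3,000 independent bills, several times the 669 roll calls in the 2005 data.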

We identify legislators with points in the interval $I = [0, 1]$ and define the distances between them to be $d(l_i, l_j) = |l_i - l_j|$. This general description seems reasonable not only for applications in political science, but also in a number of other settings. The points and the exact distance $d$ are usually unknown; however, one can often estimate $d$ from the data. For our work, we assume that one has access to an empirical distance that is locally accurate, that is, we assume one can estimate the distance between nearby points.

To complete the description of the model, something must be said about the hypothetical legislator points $l_i$. In Section 3 we specify these so that $d(l_i, l_j) = |i/n - j/n|$. Because of the uniformity assumption on the bill parameters and Lemma 2.1, aspects of the combination of assumptions can be empirically tested. A series of comparisons between model and data (along with scientific conclusions) are given in Section 4. These show rough but good accord; see, in particular, the comparison between Figures 3, 6, 7 and Figure 9 and the accompanying commentary.

Our model is a simple, natural set of assumptions which lead to a useful analysis of these data. The assumption of a uniform distribution of bills implies identifiability of distances between legislators. Equal spacing is the mathematically simplest assumption matching the observed distances. In informal work we have tried varying these assumptions but did not find that these variations led to a better understanding of the data.

3. Analysis of the model. 

3.1. Eigenfunctions and horseshoes. In this section we analyze multidimensional scaling applied to metric models satisfying

$$d(x_i, x_j) = |i/n - j/n|.$$

This corresponds to the case in which legislators are uniformly spaced in $I$: $l_i = i/n$. Now, if all the interpoint distances were known precisely, classical scaling would reconstruct the points exactly (up to a reversal of direction). In applications, it is often not possible to have globally accurate information. Rather, one can only reasonably approximate the interpoint distances for nearby points. To reflect this limited knowledge, we work with the dissimilarity

$$P(i,j) = 1 - \exp(-d(x_i, x_j)),$$

so that the dissimilarity matrix is

$$P = \begin{pmatrix}
0 & 1 - e^{-1/n} & \cdots & 1 - e^{-(n-1)/n} \\
1 - e^{-1/n} & 0 & \ddots & \vdots \\
\vdots & \ddots & \ddots & 1 - e^{-1/n} \\
1 - e^{-(n-1)/n} & \cdots & 1 - e^{-1/n} & 0
\end{pmatrix}.$$

We are interested in finding eigenfunctions for the doubly centered matrix

$$S = -\tfrac{1}{2} HPH = -\tfrac{1}{2}(P - JP - PJ + JPJ),$$

where $J = (1/n)\mathbf{1}\mathbf{1}^T$. To prove limiting results, we work with the scaled matrices $S_n = (1/n)S$. Approximate eigenfunctions for $S_n$ are found by considering a limit $K$ of the matrices $S_n$, and then solving the corresponding integral equation

$$\int_0^1 K(x,y) f(y)\, dy = \lambda f(x).$$

Standard matrix perturbation theory is then applied to recover approximate eigenfunctions for the original, discrete matrix.

When we continuize the scaled matrices $S_n$, we get the kernel defined for $(x,y) \in [0,1] \times [0,1]$

$$\begin{aligned}
K(x,y) &= \frac{1}{2}\Big(e^{-|x-y|} - \int_0^1 e^{-|x-y|}\, dx - \int_0^1 e^{-|x-y|}\, dy + \int_0^1\!\!\int_0^1 e^{-|x-y|}\, dx\, dy\Big) \\
&= \frac{1}{2}\big(e^{-|x-y|} + e^{-y} + e^{-(1-y)} + e^{-x} + e^{-(1-x)}\big) + e^{-1} - 2.
\end{aligned}$$

Recognizing this as a kernel similar to those in Fredholm equations of the second kind suggests that there are trigonometric solutions, as we show in Theorem A.2 in the Appendix. The eigenfunctions we derive are in agreement with those arising from the voting data, lending considerable insight into our data analysis problem and, more importantly, the horseshoe phenomenon. The sequence of explicit diagonalizations and approximations developed in the Appendix leads to the main results of this section, giving closed-form approximations for the eigenvectors (Theorem 3.1) and eigenvalues (Theorem 3.2); the proofs of these are also in the Appendix.
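The continuum limit can be checked against the discrete double centering: for equally spaced points, the entries of $S = -\tfrac{1}{2} HPH$ should be close to $K(x_i, x_j)$, with a discretization error that shrinks as $n$ grows (row and column averages replace the integrals). The sketch below is illustrative; the helper `kernel_K` and the choice $n = 400$ are our own.

```python
import numpy as np

def kernel_K(x, y):
    """Continuum limit K(x, y) of the centered matrix, in the closed form above."""
    return 0.5 * (np.exp(-np.abs(x - y)) + np.exp(-x) + np.exp(-(1 - x))
                  + np.exp(-y) + np.exp(-(1 - y))) + np.exp(-1.0) - 2.0

n = 400
t = np.arange(1, n + 1) / n                     # uniformly spaced legislators l_i = i/n
P = 1.0 - np.exp(-np.abs(t[:, None] - t[None, :]))
H = np.eye(n) - np.ones((n, n)) / n
S = -0.5 * H @ P @ H                            # discrete double centering

max_err = np.abs(S - kernel_K(t[:, None], t[None, :])).max()
```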
Theorem 3.1. Consider the centered and scaled proximity matrix defined by

$$S_n(x_i, x_j) = \frac{1}{2n}\Big(e^{-|i-j|/n} + e^{-i/n} + e^{-(1-i/n)} + e^{-j/n} + e^{-(1-j/n)} + 2e^{-1} - 4\Big)$$

for $1 \le i, j \le n$.

1. Set $f_{n,a}(x_i) = \cos(a(i/n - 1/2)) - (2/a)\sin(a/2)$, where $a$ is a positive solution to $\tan(a/2) = a/(2 + 3a^2)$. Then

$$S_n f_{n,a}(x_i) = \frac{1}{1+a^2} f_{n,a}(x_i) + R_{f,n}, \qquad \text{where } |R_{f,n}| \le \frac{a+4}{2n}.$$

2. Set $g_{n,a}(x_i) = \sin(a(i/n - 1/2))$, where $a$ is a positive solution to $a\cot(a/2) = -1$. Then

$$S_n g_{n,a}(x_i) = \frac{1}{1+a^2} g_{n,a}(x_i) + R_{g,n}, \qquad \text{where } |R_{g,n}| \le \frac{a+2}{2n}.$$

That is, $f_{n,a}$ and $g_{n,a}$ are approximate eigenfunctions of $S_n$.

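Theorem 3.2. Consider the setting of Theorem 3.1 and let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of $S_n$.

1. For positive solutions to $\tan(a/2) = a/(2 + 3a^2)$,

$$\min_{1 \le i \le n} \Big|\lambda_i - \frac{1}{1+a^2}\Big| \le \frac{a+4}{n}.$$

2. For positive solutions to $a\cot(a/2) = -1$,

$$\min_{1 \le i \le n} \Big|\lambda_i - \frac{1}{1+a^2}\Big| \le \frac{a+2}{n}.$$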



In the Appendix we prove an uncentered version of this theorem (Theorem A.3), which is needed for the uncentered matrices arising in the double horseshoe case treated next (Section 3.1.1).
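Theorems 3.1 and 3.2 can be checked numerically. The sketch below solves the two transcendental equations by bisection (our own choice of root finder), builds $S_n$ by discrete double centering, and compares its two leading eigenpairs with $1/(1+a^2)$ and the predicted eigenfunctions. The brackets $[3, 4]$ and $[6, 6.6]$ were chosen to isolate the smallest positive roots and are assumptions of this sketch, as is $n = 300$.

```python
import math
import numpy as np

def bisect(f, lo, hi, tol=1e-12):
    """Plain bisection; assumes f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if (fmid > 0) == (flo > 0):
            lo, flo = mid, fmid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Smallest positive roots of the two equations in Theorem 3.1.
a_sin = bisect(lambda a: a / math.tan(a / 2) + 1.0, 3.0, 4.0)              # a cot(a/2) = -1
a_cos = bisect(lambda a: math.tan(a / 2) - a / (2 + 3 * a * a), 6.0, 6.6)  # tan(a/2) = a/(2+3a^2)

n = 300
t = np.arange(1, n + 1) / n
P = 1.0 - np.exp(-np.abs(t[:, None] - t[None, :]))
H = np.eye(n) - np.ones((n, n)) / n
S_n = -0.5 * H @ P @ H / n                      # scaled matrix S_n = S / n
evals, evecs = np.linalg.eigh(S_n)
evals, evecs = evals[::-1], evecs[:, ::-1]      # largest eigenvalues first

g = np.sin(a_sin * (t - 0.5))                                    # g_{n,a}
f = np.cos(a_cos * (t - 0.5)) - (2 / a_cos) * np.sin(a_cos / 2)  # f_{n,a}

def abs_cosine(u, v):
    """|cos| of the angle between two vectors (sign of eigenvectors is arbitrary)."""
    return abs(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))
```

The leading eigenvector matches the sine family and the second matches the cosine family, which is the ordering used in Section 3.1.1.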

In the results above, we transformed distances into dissimilarities via the exponential transformation $P(i,j) = 1 - \exp(-d(x_i, x_j))$. If we worked with the distances directly, so that the dissimilarity matrix is given by $P(i,j) = |l_i - l_j|$, then much of what we develop here stays true. In particular, the operators are explicitly diagonalizable with similar eigenfunctions. This has been independently studied by physicists in what they call the crystal configuration of a one-dimensional Anderson model, with spectral decomposition analyzed in Bogomolny, Bohigas and Schmit (2003).
Fig. 2. Approximate eigenfunctions $f_1$ and $f_2$.

3.1.1. Horseshoes and twin horseshoes. The 2-dimensional MDS mapping is built out of the first and second eigenfunctions of the centered proximity matrix. As shown above, we have the following approximate eigenfunctions:

• $f_1(x_i) = g_{n,a_1}(x_i) = \sin(3.67(i/n - 1/2))$ with eigenvalue $\lambda_1 \approx 0.07$,

• $f_2(x_i) = f_{n,a_2}(x_i) \approx \cos(6.39(i/n - 1/2))$ with eigenvalue $\lambda_2 \approx 0.02$,

where the eigenvalues are for the scaled matrix. Figure 2 shows a graph of these eigenfunctions. Moreover, Figure 3 shows the horseshoe that results from plotting $\Lambda: x_i \mapsto (\sqrt{\lambda_1} f_1(x_i), \sqrt{\lambda_2} f_2(x_i))$. From $\Lambda$ it is possible to deduce the relative order of the Representatives in the interval $I$. Since $-f_1$ is also an eigenfunction, it is not in general possible to determine the absolute order knowing only that $\Lambda$ comes from the eigenfunctions. However, as can be seen in Figure 3, the relationship between the two eigenfunctions is a curve for which we have the parametrization given above, but which cannot be written in functional form; in particular, the second eigenvector is not a quadratic function of the first, as is sometimes claimed.
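That the horseshoe is not the graph of a function can be seen from the formulas for $f_1$ and $f_2$ alone: since $3.67/2 > \pi/2$, $f_1$ reaches its maximum before the end of the interval, so two different legislators share the same first coordinate but have different second coordinates. The following sketch, which simply evaluates the approximate eigenfunctions quoted above, exhibits such a pair; the helper name `horseshoe_point` is our own.

```python
import math

A1, A2 = 3.67, 6.39        # approximate roots quoted in the text
LAM1, LAM2 = 0.07, 0.02    # approximate eigenvalues quoted in the text

def horseshoe_point(s):
    """Map s = i/n - 1/2 in [-1/2, 1/2] to (sqrt(lam1) f1, sqrt(lam2) f2)."""
    return (math.sqrt(LAM1) * math.sin(A1 * s),
            math.sqrt(LAM2) * math.cos(A2 * s))

s_peak = math.pi / (2 * A1)    # f1 = sin(A1 * s) is maximal here, before s = 1/2
s_tip = 0.5                    # the extreme legislator
s_mirror = 2 * s_peak - s_tip  # reflection of s_tip about the peak: same f1 value
p_tip = horseshoe_point(s_tip)
p_mirror = horseshoe_point(s_mirror)
```

The same folding means the extreme legislator is not at the outermost point of the curve in the first coordinate.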

With the voting data, we see not one, but two horseshoes. To see how this can happen, consider the two-population state space $\mathcal{X} = \{x_1, \ldots, x_n, y_1, \ldots, y_n\}$ with proximity $d(x_i, x_j) = 1 - e^{-|i/n - j/n|}$, $d(y_i, y_j) = 1 - e^{-|i/n - j/n|}$ and $d(x_i, y_j) = 1$. This leads to the partitioned proximity matrix

$$P_{2n} = \begin{pmatrix} P_n & \mathbf{1}\mathbf{1}^T \\ \mathbf{1}\mathbf{1}^T & P_n \end{pmatrix},$$

where $P_n(i,j) = 1 - e^{-|i/n - j/n|}$.
Corollary 3.1. From Theorem A.3 we have the following approximate eigenfunctions and eigenvalues for $-(1/2n)P_{2n}$:

• $f_1(i) = \cos(a_1(i/n - 1/2))$ for $1 \le i \le n$, and $f_1(j) = -\cos(a_1((j-n)/n - 1/2))$ for $n+1 \le j \le 2n$, where $a_1 \approx 1.3$ and $\lambda_1 \approx 0.37$.

• $f_2(i) = \sin(a_2(i/n - 1/2))$ for $1 \le i \le n$, and $f_2(j) = 0$ for $n+1 \le j \le 2n$, where $a_2 \approx 3.67$ and $\lambda_2 \approx 0.069$.

• $f_3(i) = 0$ for $1 \le i \le n$, and $f_3(j) = \sin(a_2((j-n)/n - 1/2))$ for $n+1 \le j \le 2n$, where $a_2 \approx 3.67$ and $\lambda_3 \approx 0.069$.



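Proof. We have

$$-\frac{1}{2n}P_{2n} = \begin{pmatrix} A_n & 0 \\ 0 & A_n \end{pmatrix} - \frac{1}{2n}\mathbf{1}\mathbf{1}^T,$$

where $A_n(i,j) = (1/2n)e^{-|i/n - j/n|}$ and $\mathbf{1}$ now denotes the vector of ones of length $2n$. If $u$ is an eigenvector of $A_n$ with eigenvalue $\lambda$, then the vector $(u, -u)$ of length $2n$ is an eigenvector of $-(1/2n)P_{2n}$, since its entries sum to zero and hence

$$\left[\begin{pmatrix} A_n & 0 \\ 0 & A_n \end{pmatrix} - \frac{1}{2n}\mathbf{1}\mathbf{1}^T\right]\begin{pmatrix} u \\ -u \end{pmatrix} = \lambda\begin{pmatrix} u \\ -u \end{pmatrix} + 0.$$

If we additionally have that $\mathbf{1}^T u = 0$, then, similarly, $(u, 0)$ and $(0, u)$ are also eigenfunctions of $-(1/2n)P_{2n}$. □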
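The argument in the proof can be checked on a finite example. The sketch below builds $P_{2n}$ for an arbitrary $n = 200$, computes the leading eigenpairs of $-(1/2n)P_{2n}$, and confirms that the top eigenvector has the form $(u, -u)$, that its eigenvalue equals the top eigenvalue of $A_n$, and that all three leading eigenvalues are close to the values in Corollary 3.1.

```python
import numpy as np

n = 200
t = np.arange(1, n + 1) / n
E_n = np.exp(-np.abs(t[:, None] - t[None, :]))
P_n = 1.0 - E_n
ones = np.ones((n, n))
P_2n = np.block([[P_n, ones], [ones, P_n]])   # proximity 1 between the two populations

evals, evecs = np.linalg.eigh(-P_2n / (2 * n))
evals, evecs = evals[::-1], evecs[:, ::-1]     # largest eigenvalues first
v1 = evecs[:, 0]                               # expected form (u, -u)

A_n = E_n / (2 * n)
top_A = np.linalg.eigvalsh(A_n)[-1]            # largest eigenvalue of A_n
```

The second and third eigenvalues coincide, so a numerical solver may return any rotation of $f_2$ and $f_3$ within their common eigenspace.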




Fig. 3. A horseshoe that results from plotting $\Lambda: x_i \mapsto (\sqrt{\lambda_1} f_1(x_i), \sqrt{\lambda_2} f_2(x_i))$.
Since the functions $f_1$, $f_2$ and $f_3$ of Corollary 3.1 are all orthogonal to constant functions, by the discussion in Section 1.2 they are also approximate eigenfunctions for the centered, scaled matrix $-(1/2n)HP_{2n}H$. These functions are graphed in Figure 4, and the twin horseshoes that result from the 3-dimensional mapping $\Lambda: z \mapsto (\sqrt{\lambda_1} f_1(z), \sqrt{\lambda_2} f_2(z), \sqrt{\lambda_3} f_3(z))$ are shown in Figure 5. The first eigenvector provides the separation into two groups; this is a well-known method for separating clusters, known today as spectral clustering [Shi and Malik (2000)]. For a nice survey and consistency results, see von Luxburg, Belkin and Bousquet (2008).



Remark. The matrices $A_n$ and $P_{2n}$ above are centrosymmetric [Weaver (1985)], that is, symmetrical around the center of the matrix. Formally, if $K$ is the matrix with 1's on the counter (or secondary) diagonal,

$$K = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 0 & 0 & \cdots & 1 & 0 \\ \vdots & & & & \vdots \\ 1 & 0 & \cdots & 0 & 0 \end{pmatrix},$$

then a matrix $B$ is centrosymmetric iff $BK = KB$. A very useful review by Weaver (1985) quotes I. J. Good (1970) on the connection between centrosymmetric matrices and kernels of integral equations: "Toeplitz matrices (which are examples of matrices which are both symmetric and centrosymmetric) arise as discrete approximations to kernels $k(x,t)$ of integral equations when these kernels are functions of $|x - t|$." (Today we would call these isotropic kernels.) "Similarly if a kernel is an even function of its vector argument $(x,t)$, that is, if $k(x,t) = k(-x,-t)$, then it can be discretely approximated by a centrosymmetric matrix."

Fig. 4. Approximate eigenfunctions $f_1$, $f_2$ and $f_3$ for the centered proximity matrix arising from the two population model.

Fig. 5. Twin horseshoes in the two population model that result from plotting $\Lambda: z \mapsto (\sqrt{\lambda_1} f_1(z), \sqrt{\lambda_2} f_2(z), \sqrt{\lambda_3} f_3(z))$.

Centrosymmetric matrices have very neat eigenvector formulas [Cantoni and Butler (1976)]. In particular, if the order of the matrix, $n$, is even, then the first eigenvector is skew symmetric and thus of the form $(u_1, -u_1)$ and orthogonal to the constant vector. This explains the miracle that seems to occur in the simplification of the eigenvectors in the above formulae.
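Centrosymmetry and the resulting eigenvector parity can be checked directly. The sketch below builds the exchange matrix $K$ with NumPy's `fliplr`, applies the test $BK = KB$ to $A_n$ for an arbitrary $n = 50$, and classifies each leading eigenvector as symmetric ($Kv = v$) or skew symmetric ($Kv = -v$); skew-symmetric vectors automatically sum to zero. The helper `parity` is our own.

```python
import numpy as np

n = 50
t = np.arange(1, n + 1) / n
A_n = np.exp(-np.abs(t[:, None] - t[None, :])) / (2 * n)
K = np.fliplr(np.eye(n))                 # exchange matrix: ones on the counter-diagonal

centro = np.allclose(A_n @ K, K @ A_n)   # B is centrosymmetric iff BK = KB

evals, evecs = np.linalg.eigh(A_n)
evecs = evecs[:, ::-1]                   # eigenvectors for the largest eigenvalues first

def parity(v, tol=1e-8):
    """+1 if Kv = v (symmetric), -1 if Kv = -v (skew symmetric), 0 otherwise."""
    if np.allclose(K @ v, v, atol=tol):
        return 1
    if np.allclose(K @ v, -v, atol=tol):
        return -1
    return 0

parities = [parity(evecs[:, j]) for j in range(4)]
```

For $A_n$ the leading eigenvectors alternate between the cosine (symmetric) and sine (skew-symmetric) families.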

4. Connecting the model to the data. When we apply MDS to the voting 
data, the first three eigenvalues are as follows: 

• 0.13192, 

• 0.00764, 

• 0.00634. 

Observe that, as our two population model suggests, the second and third eigenvalues are about equal and significantly smaller than the first.

P. DIACONIS, S. GOEL AND S. HOLMES

Figure 6 shows the first, second and third eigenfunctions $f_1$, $f_2$ and $f_3$ from the voting data. The 3-dimensional MDS plot in Figure 1(a) is the graph of $\Lambda: x_i \mapsto (\sqrt{\lambda_1} f_1(x_i), \sqrt{\lambda_2} f_2(x_i), \sqrt{\lambda_3} f_3(x_i))$. Since legislators are not a priori ordered, the eigenfunctions are difficult to interpret. However, our model suggests the following ordering: split the legislators into two groups $G_1$ and $G_2$ based on the sign of $f_1(x_i)$; then the norm of $f_2$ is larger on one group, say, $G_1$, so we sort $G_1$ based on increasing values of $f_2$, and similarly, sort $G_2$ via $f_3$. Figure 7 shows the same data as does Figure 6, but with this judicious ordering of the legislators. Figure 8 shows the ordered eigenfunctions obtained from MDS applied to the 2004 roll call data. The results appear to be in agreement with the theoretically derived functions in Figure 4. This agreement gives one validation of the modeling assumptions in Section 2.
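The reordering rule described above can be written in a few lines. This is a sketch of our reading of the rule, not the authors' code: the function name `reorder`, the convention that legislators with $f_1 = 0$ go to the first group, and the toy input values are all assumptions.

```python
def reorder(f1, f2, f3):
    """Order legislators using the first three MDS eigenfunctions (one value per legislator).

    Split by the sign of f1; the group on which f2 has the larger squared norm
    is sorted by f2, the other group by f3. Returns a list of legislator indices.
    """
    idx = range(len(f1))
    g1 = [i for i in idx if f1[i] >= 0]   # assumption: f1 == 0 goes to the first group
    g2 = [i for i in idx if f1[i] < 0]

    def sq_norm(f, group):
        return sum(f[i] ** 2 for i in group)

    if sq_norm(f2, g1) < sq_norm(f2, g2):  # make g1 the group where f2 is larger
        g1, g2 = g2, g1
    return sorted(g1, key=lambda i: f2[i]) + sorted(g2, key=lambda i: f3[i])

# Toy eigenfunction values for four legislators (hypothetical).
f1 = [0.5, -0.5, 0.5, -0.5]
f2 = [0.3, 0.0, -0.3, 0.01]
f3 = [0.0, -0.2, 0.0, 0.4]
order = reorder(f1, f2, f3)
```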

The theoretical second and third eigenfunctions are part of a two-dimensional eigenspace. In the voting data it is reasonable to assume that noise eliminates symmetry and collapses the eigenspaces down to one dimension. Nonetheless, we would guess that the second and third eigenfunctions in the voting data are in the two-dimensional predicted eigenspace, as is seen to be the case in Figures 7 and 8.

Our analysis in Section 3 suggests that if legislators are in fact isometrically embedded in the interval $I$ (relative to the roll call distance), then their MDS derived rank will be consistent with the order of legislators in the interval. This appears to be the case in the data, as seen in Figure 9, which shows a graph of $\hat{d}(l_i, \cdot)$ for selected legislators $l_i$. For example, as we would predict, $\hat{d}(l_1, \cdot)$ is an increasing function and $\hat{d}(l_n, \cdot)$ is decreasing. Moreover, the data seem to be in rough agreement with the metric assumption of our two population model, namely, that the two groups are well separated and that the within-group distance is given by $d(l_i, l_j) = |i/n - j/n|$. This agreement is another validation of the modeling assumptions in Section 2.

Fig. 6. The first, second and third eigenfunctions output from MDS applied to the 2005 U.S. House of Representatives roll call votes.

Fig. 7. The re-indexed first, second and third eigenfunctions output from MDS applied to the 2005 U.S. House of Representatives roll call votes. Colors indicate political parties.

Fig. 8. The re-indexed first, second and third eigenfunctions output from MDS applied to the 2004 U.S. House of Representatives roll call votes. Colors indicate political parties.

Fig. 9. The empirical roll call derived distance function $d(l_i, \cdot)$ for selected legislators $l_i$, $i = 1, 90, 181, 182, 290, 401$. The x-axis orders legislators according to their MDS rank.

Our voting model suggests that the MDS ordering of legislators should correspond to political ideology. To test this, we compared the MDS results to the assessment of legislators by Americans for Democratic Action [Americans for Democratic Action (2005)]. Each year ADA selects 20 votes it considers the most important during that session, for example, the Patriot Act reauthorization. Legislators are assigned a Liberal Quotient (LQ): the percentage of those 20 votes on which the Representative voted in accordance with what ADA considered to be the liberal position. For example, a Representative who voted the liberal position on all 20 votes would receive an LQ of 100%. Figure 10 below shows a plot of LQ vs. MDS rank.
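The Liberal Quotient is a simple agreement percentage. The sketch below computes it for one legislator; the `'Y'`/`'N'` encoding, the helper name `liberal_quotient`, and the assumption that every legislator voted on all 20 key bills (no absences) are illustrative assumptions, not ADA's published procedure.

```python
def liberal_quotient(votes, liberal_positions):
    """Percentage of key votes on which a legislator took the liberal position.

    Hypothetical helper: both arguments are equal-length sequences of 'Y'/'N',
    and the legislator is assumed to have voted on every key bill.
    """
    if len(votes) != len(liberal_positions):
        raise ValueError("votes and liberal_positions must have the same length")
    agreements = sum(v == p for v, p in zip(votes, liberal_positions))
    return 100.0 * agreements / len(liberal_positions)


# Hypothetical example: ADA's liberal position is 'Y' on all 20 key votes.
ada_positions = ["Y"] * 20
always_liberal = ["Y"] * 20
three_liberal = ["Y"] * 3 + ["N"] * 17

print(liberal_quotient(always_liberal, ada_positions))  # 100.0
print(liberal_quotient(three_liberal, ada_positions))   # 15.0
```

A legislator with 3 liberal votes out of 20 receives 15%, the rating discussed for the two Republicans below.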

Fig. 10. Comparison of the MDS derived rank for Representatives with the Liberal Quotient as defined by Americans for Democratic Action.

For the most part, the two measures are consistent. However, MDS separates two groups of relatively liberal Republicans. To see why this is the case, consider the two legislators Mary Bono (R-CA), with MDS rank 248, and Gil Gutknecht (R-MN), with rank 373. Both Representatives received an ADA rating of 15%, yet had considerably different voting records. On the 20 ADA bills, Bono and Gutknecht each supported the liberal position 3 times, but never on the same bill. Consequently, the empirical roll call distance between them is relatively large considering that they are both Republicans. Since MDS attempts to preserve local distances, the algorithm separates Bono and Gutknecht. In this setting, distance reflects the propensity of legislators to vote the same way on any given bill. The pattern in Figure 10 arises because this notion of proximity, although related, does not correspond directly to political ideology. The MDS and ADA rankings complement one another in the sense that together they facilitate identification of two distinct, yet relatively liberal, groups of Republicans. That is, although these two groups are relatively liberal, they do not share the same political positions.
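The Bono and Gutknecht situation can be reproduced in miniature. The sketch below uses made-up 0/1 vote vectors on 20 bills (1 = liberal position) and, as a stand-in for the roll call distance of Section 2, the fraction of bills on which two legislators vote differently; both the vote data and this choice of distance are assumptions for illustration only.

```python
def liberal_share(votes):
    """Fraction of bills on which the liberal position (coded 1) was taken."""
    return sum(votes) / len(votes)


def disagreement(u, v):
    """Fraction of bills on which two legislators voted differently.

    Stand-in for the roll call distance; assumes equal-length 0/1 vectors.
    """
    if len(u) != len(v):
        raise ValueError("vote vectors must have the same length")
    return sum(a != b for a, b in zip(u, v)) / len(u)


n_bills = 20
# Hypothetical records: each is liberal on 3 bills, never on the same bill.
legislator_a = [1 if b in (0, 1, 2) else 0 for b in range(n_bills)]
legislator_b = [1 if b in (3, 4, 5) else 0 for b in range(n_bills)]

print(liberal_share(legislator_a), liberal_share(legislator_b))  # 0.15 0.15
print(disagreement(legislator_a, legislator_b))                  # 0.3
```

Identical liberal shares thus coexist with disagreement on 6 of the 20 bills, which is the kind of separation a distance-preserving method picks up.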

Like ADA, the National Journal ranks Representatives each year based on their voting record. In 2005, the Journal chose 41 votes on economic issues, 42 on social issues and 24 dealing with foreign policy. Based on these 107 votes, legislators were assigned a rating between 0 and 100; lower numbers indicate a more liberal political ideology. Figure 11 is a plot of the National Journal vs. MDS rankings, and shows results similar to the ADA comparison. As in the ADA case, we see that relatively liberal Republicans receive quite different MDS ranks. Interestingly, this phenomenon does not appear for Democrats under either the ADA or the National Journal ranking system.

Summary. Our work began with an empirical finding: multidimensional scaling applied to voting data from the U.S. House of Representatives shows a clean double horseshoe pattern (Figure 1). These patterns occur often enough in data reduction techniques that it is natural to seek a theoretical understanding. Our main results give a limiting closed form explanation for data matrices that are double-centered versions of

$$P(i,j) = 1 - e^{-\theta|i/n - j/n|}, \qquad 1 \le i, j \le n.$$

We further show how voting data arising from a cut-point model developed in Section 3 gives rise to a model of this form.
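A minimal numerical sketch of this setting, assuming $\theta = 1$ (the value used for the continuous kernel in the Appendix) and $n = 400$ equally spaced points: $P$ is double-centered in the classical MDS fashion, $B = -\tfrac12 JPJ$ with $J = I - \tfrac1n\mathbf{1}\mathbf{1}^T$, and the two leading eigenvectors are inspected. The helper names are hypothetical. The first eigenvector should change sign once and the second twice, which is what makes their scatter plot a horseshoe.

```python
import numpy as np


def double_centered(n, theta=1.0):
    """Return B = -1/2 * J P J for P(i, j) = 1 - exp(-theta * |i/n - j/n|)."""
    idx = np.arange(n)
    P = 1.0 - np.exp(-theta * np.abs(idx[:, None] - idx[None, :]) / n)
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return -0.5 * J @ P @ J


def sign_changes(v):
    """Number of sign changes along a vector (no exact zeros expected here)."""
    s = np.sign(v)
    return int(np.count_nonzero(s[1:] != s[:-1]))


n = 400
B = double_centered(n)
eigenvalues, eigenvectors = np.linalg.eigh(B)  # ascending order
top = np.argsort(eigenvalues)[::-1]
v1 = eigenvectors[:, top[0]]
v2 = eigenvectors[:, top[1]]

print(sign_changes(v1), sign_changes(v2))  # expected: 1 2
# Plotting v1 against v2 (e.g. with matplotlib) traces out the horseshoe.
```

An even $n$ is used so that no grid point sits exactly at the center, where the antisymmetric first eigenvector would be numerically zero.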
Fig. 11. Comparison of the eigendecomposition derived rank for Representatives with the National Journal's liberal score.



In a follow-up to this paper, de Leeuw (2007) has shown that some of our results can be derived directly without passing to a continuous kernel. A useful byproduct of his results and of conversations with colleagues and students is this: the matrix $P_{ij}$ above is totally positive. Standard theory shows that the first eigenvector can be taken to be increasing and the second to be unimodal. Plotting these eigenvectors against each other will always result in a horseshoe shape. Perhaps this explains the ubiquity of horseshoes.

APPENDIX: THEOREMS AND PROOFS FOR SECTION 3 

We first state a classical perturbation result that relates two different notions of an approximate eigenfunction. A proof is included here to aid the reader. For more refined estimates, see Parlett (1980), Chapter 4, page 69.

Two lemmas provide trigonometric identities that are useful for finding the eigenfunctions for the continuous kernel. Theorem A.2 states specific solutions to the corresponding integral equation. We then provide a proof for Theorem 3.1. The version of this theorem for uncentered matrices (Theorem A.3) follows and is used in the two horseshoe case.

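Theorem A.1. Consider an $n \times n$ symmetric matrix $A$ with eigenvalues $\lambda_1 \le \cdots \le \lambda_n$. If for $\varepsilon > 0$

$$\|Af - \lambda f\|_2 \le \varepsilon$$

for some $f, \lambda$ with $\|f\|_2 = 1$, then $A$ has an eigenvalue $\lambda_k$ such that $|\lambda_k - \lambda| \le \varepsilon$.

If we further assume that

$$s = \min_{i:\, \lambda_i \ne \lambda_k} |\lambda_i - \lambda_k| > \varepsilon,$$

then $A$ has an eigenfunction $f_k$ such that $Af_k = \lambda_k f_k$ and $\|f - f_k\|_2 \le \varepsilon/(s - \varepsilon)$.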

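Proof. First we show that $\min_i |\lambda_i - \lambda| \le \varepsilon$. If $\min_i |\lambda_i - \lambda| = 0$, we are done; otherwise $A - \lambda I$ is invertible. Then,

$$\|f\|_2 \le \|(A - \lambda I)^{-1}\| \cdot \|(A - \lambda I) f\|_2 \le \varepsilon \|(A - \lambda I)^{-1}\|.$$

Since the eigenvalues of $(A - \lambda I)^{-1}$ are $1/(\lambda_1 - \lambda), \dots, 1/(\lambda_n - \lambda)$, by symmetry,

$$\|(A - \lambda I)^{-1}\| = \frac{1}{\min_i |\lambda_i - \lambda|}.$$

The result now follows since $\|f\|_2 = 1$.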

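Set $\lambda_k = \arg\min_{\lambda_i} |\lambda_i - \lambda|$, and consider an orthonormal basis $g_1, \dots, g_m$ of the associated eigenspace $E_{\lambda_k}$. Define $f_k$ to be the projection of $f$ onto $E_{\lambda_k}$:

$$f_k = \langle f, g_1\rangle g_1 + \cdots + \langle f, g_m\rangle g_m.$$

Then $f_k$ is an eigenfunction with eigenvalue $\lambda_k$. Writing $f = f_k + (f - f_k)$, we have

$$(A - \lambda I) f = (A - \lambda I) f_k + (A - \lambda I)(f - f_k) = (\lambda_k - \lambda) f_k + (A - \lambda I)(f - f_k).$$

Since $f - f_k \in E_{\lambda_k}^{\perp}$, by symmetry, we have

$$\langle f_k, A(f - f_k)\rangle = \langle A f_k, f - f_k\rangle = \langle \lambda_k f_k, f - f_k\rangle = 0.$$

Consequently, $\langle f_k, (A - \lambda I)(f - f_k)\rangle = 0$ and by Pythagoras,

$$\|Af - \lambda f\|_2^2 = (\lambda_k - \lambda)^2 \|f_k\|_2^2 + \|(A - \lambda I)(f - f_k)\|_2^2.$$

In particular,

$$\varepsilon \ge \|Af - \lambda f\|_2 \ge \|(A - \lambda I)(f - f_k)\|_2.$$

For $\lambda_i \ne \lambda_k$, the triangle inequality gives $|\lambda_i - \lambda| \ge |\lambda_i - \lambda_k| - |\lambda_k - \lambda| \ge s - \varepsilon$. The result now follows, taking $h = f - f_k$, since for $h \in E_{\lambda_k}^{\perp}$,

$$\|(A - \lambda I) h\|_2 \ge (s - \varepsilon)\|h\|_2. \qquad \square$$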
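Theorem A.1 is easy to spot-check numerically. The sketch below builds a symmetric matrix with known, well-separated eigenvalues $1, \dots, 5$, perturbs one eigenvector, uses the Rayleigh quotient as the trial value $\lambda$, and tests both conclusions. The helper name and the test matrix are illustrative choices, and the eigenvalue nearest $\lambda$ is assumed to be simple, so the projection onto its eigenspace is a projection onto one vector.

```python
import numpy as np


def check_theorem_a1(A, f):
    """Return (eigenvalue bound holds, eigenvector bound holds or None)."""
    f = f / np.linalg.norm(f)
    lam = f @ A @ f                          # trial eigenvalue (Rayleigh quotient)
    eps = np.linalg.norm(A @ f - lam * f)    # residual ||Af - lam f||_2
    w, V = np.linalg.eigh(A)
    k = int(np.argmin(np.abs(w - lam)))      # eigenvalue closest to lam
    eig_ok = bool(abs(w[k] - lam) <= eps + 1e-12)
    s = np.min(np.abs(np.delete(w, k) - w[k]))  # gap to the other eigenvalues
    if s <= eps:
        return eig_ok, None                  # second statement does not apply
    fk = (V[:, k] @ f) * V[:, k]             # projection onto the (simple) eigenspace
    vec_ok = bool(np.linalg.norm(f - fk) <= eps / (s - eps) + 1e-12)
    return eig_ok, vec_ok


rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))   # random orthogonal matrix
A = Q @ np.diag([1.0, 2.0, 3.0, 4.0, 5.0]) @ Q.T   # eigenvalues 1, ..., 5
f = Q[:, 2] + 0.01 * rng.standard_normal(5)         # near the eigenvector for 3

print(check_theorem_a1(A, f))  # (True, True)
```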
Remark A.1. The second statement of the theorem allows nonsimple eigenvalues, but requires that the eigenvalues corresponding to distinct eigenspaces be well separated.

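Remark A.2. The eigenfunction bound of the theorem is asymptotically tight in $\varepsilon$, as the following example illustrates. Consider the matrix

$$A = \begin{bmatrix} \lambda & 0 \\ 0 & \lambda + s \end{bmatrix}$$

with $s > 0$. For $\varepsilon < s$, define the function

$$f = \begin{bmatrix} \sqrt{1 - \varepsilon^2/s^2} \\ \varepsilon/s \end{bmatrix}.$$

Then $\|f\|_2 = 1$ and $\|Af - \lambda f\|_2 = \varepsilon$. The theorem guarantees that there is an eigenfunction with eigenvalue $\lambda_k$ such that $|\lambda - \lambda_k| \le \varepsilon$. Since the eigenvalues of $A$ are $\lambda$ and $\lambda + s$, and since $s > \varepsilon$, we must have $\lambda_k = \lambda$. Let $V_k = \{f_k : Af_k = \lambda_k f_k\} = \{c e_1 : c \in \mathbb{R}\}$, where $e_1$ is the first standard basis vector. Then

$$\min_{f_k \in V_k} \|f - f_k\|_2 = \|f - \langle f, e_1\rangle e_1\|_2 = \varepsilon/s.$$

The bound of the theorem, $\varepsilon/(s - \varepsilon)$, is only slightly larger.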
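The example in Remark A.2 can be evaluated directly. The sketch below uses the illustrative values $\lambda = 1$, $s = 0.5$ and a few $\varepsilon < s$, and compares the true distance $\varepsilon/s$ with the bound $\varepsilon/(s - \varepsilon)$; their ratio $s/(s - \varepsilon)$ tends to 1 as $\varepsilon \to 0$.

```python
import numpy as np


def remark_a2(lam, s, eps):
    """Return (residual, distance to eigenspace, Theorem A.1 bound) for Remark A.2."""
    A = np.diag([lam, lam + s])
    f = np.array([np.sqrt(1.0 - eps**2 / s**2), eps / s])   # unit vector
    residual = np.linalg.norm(A @ f - lam * f)              # equals eps
    e1 = np.array([1.0, 0.0])
    distance = np.linalg.norm(f - (f @ e1) * e1)            # equals eps / s
    return residual, distance, eps / (s - eps)


for eps in (0.1, 0.01, 0.001):
    residual, distance, bound = remark_a2(1.0, 0.5, eps)
    print(f"eps={eps}: residual={residual:.4g}, distance={distance:.4g}, "
          f"bound={bound:.4g}, bound/distance={bound / distance:.4f}")
```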

We establish an integral identity in order to find trigonometric solutions 
to Kf = Xf where K is the continuized kernel of the centered exponential 
proximity matrix. 



Lemma A.1. For constants $a \in \mathbb{R}$ and $c \in [0, 1]$,

$$\int_0^1 e^{-|x-c|}\cos[a(x - 1/2)]\,dx = \frac{2\cos[a(c - 1/2)] + (e^{-c} + e^{c-1})(a\sin(a/2) - \cos(a/2))}{1 + a^2}$$

and

$$\int_0^1 e^{-|x-c|}\sin[a(x - 1/2)]\,dx = \frac{2\sin[a(c - 1/2)] + (e^{-c} - e^{c-1})(a\cos(a/2) + \sin(a/2))}{1 + a^2}.$$

Proof. The lemma follows from a straightforward integration. First split the integral into two pieces:

$$\int_0^1 e^{-|x-c|}\cos[a(x - 1/2)]\,dx = \int_0^c e^{x-c}\cos[a(x - 1/2)]\,dx + \int_c^1 e^{c-x}\cos[a(x - 1/2)]\,dx.$$

By integration by parts applied twice,

$$\int e^{x-c}\cos[a(x - 1/2)]\,dx = \frac{a e^{x-c}\sin(a(x - 1/2)) + e^{x-c}\cos(a(x - 1/2))}{1 + a^2}$$

and

$$\int e^{c-x}\cos[a(x - 1/2)]\,dx = \frac{a e^{c-x}\sin(a(x - 1/2)) - e^{c-x}\cos(a(x - 1/2))}{1 + a^2}.$$

Evaluating these expressions at the appropriate limits of integration gives the first statement of the lemma. The computation of $\int_0^1 e^{-|x-c|}\sin[a(x - 1/2)]\,dx$ is analogous, and so is omitted here. $\square$
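Both closed forms in Lemma A.1 can be compared against a fine midpoint-rule quadrature. The sketch below does this for a handful of arbitrary $(a, c)$ pairs; the grid size and the test pairs are illustrative choices, and the kink of $e^{-|x-c|}$ at $x = c$ only affects the quadrature error at second order.

```python
import numpy as np


def lemma_a1_closed_form(a, c):
    """Right-hand sides of Lemma A.1 (cosine and sine versions)."""
    denom = 1.0 + a**2
    cos_part = (2 * np.cos(a * (c - 0.5))
                + (np.exp(-c) + np.exp(c - 1)) * (a * np.sin(a / 2) - np.cos(a / 2))) / denom
    sin_part = (2 * np.sin(a * (c - 0.5))
                + (np.exp(-c) - np.exp(c - 1)) * (a * np.cos(a / 2) + np.sin(a / 2))) / denom
    return cos_part, sin_part


def lemma_a1_quadrature(a, c, n=200_000):
    """Midpoint-rule approximations of the two integrals over [0, 1]."""
    x = (np.arange(n) + 0.5) / n
    weight = np.exp(-np.abs(x - c))
    return (np.mean(weight * np.cos(a * (x - 0.5))),
            np.mean(weight * np.sin(a * (x - 0.5))))


for a, c in [(0.5, 0.0), (3.0, 0.3), (7.3, 0.8), (-2.0, 1.0)]:
    exact = np.array(lemma_a1_closed_form(a, c))
    approx = np.array(lemma_a1_quadrature(a, c))
    print(a, c, np.max(np.abs(exact - approx)))
```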

We now derive eigenfunctions for the continuous kernel. 

Theorem A.2. For the kernel

$$K(x,y) = \tfrac12\left(e^{-|x-y|} + e^{-y} + e^{-(1-y)} + e^{-x} + e^{-(1-x)}\right) + e^{-1} - 2$$

defined on $[0,1] \times [0,1]$, the corresponding integral equation

$$\int_0^1 K(x,y) f(y)\,dy = \lambda f(x)$$

has solutions

$$f(x) = \sin(a(x - 1/2)), \qquad a\cot(a/2) = -1$$

and

$$f(x) = \cos(a(x - 1/2)) - \frac{2}{a}\sin(a/2), \qquad \tan(a/2) = \frac{a}{2 + 3a^2}.$$

In both cases, $\lambda = 1/(1 + a^2)$.

Proof. First note that both classes of functions in the statement of the theorem satisfy $\int_0^1 f(x)\,dx = 0$. Consequently, the integral simplifies to

$$\int_0^1 K(x,y) f(y)\,dy = \frac12\int_0^1 \left(e^{-|x-y|} + e^{-y} + e^{-(1-y)}\right) f(y)\,dy.$$

Furthermore, since $e^{-y} + e^{-(1-y)}$ is symmetric about $1/2$ and $\sin(a(y - 1/2))$ is skew-symmetric about $1/2$, Lemma A.1 shows that

$$\int_0^1 K(x,y)\sin(a(y - 1/2))\,dy = \frac12\int_0^1 e^{-|x-y|}\sin(a(y - 1/2))\,dy = \frac{\sin[a(x - 1/2)]}{1 + a^2} + \frac{(e^{-x} - e^{x-1})(a\cos(a/2) + \sin(a/2))}{2(1 + a^2)}.$$

When $a\cot(a/2) = -1$, the factor $a\cos(a/2) + \sin(a/2)$ vanishes, and this establishes the first statement of the theorem. We examine the second.
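Since $\int_0^1 K(x,y)\,dy = 0$,

$$\int_0^1 \left(e^{-|x-y|} + e^{-y} + e^{-(1-y)}\right) dy = 4 - 2e^{-1} - e^{-x} - e^{-(1-x)},$$

and also, by straightforward integration by parts,

$$\int_0^1 e^{-y}\cos(a(y - 1/2))\,dy = \int_0^1 e^{-(1-y)}\cos(a(y - 1/2))\,dy = \frac{a\sin(a/2)(1 + e^{-1})}{1 + a^2} + \frac{\cos(a/2)(1 - e^{-1})}{1 + a^2}.$$

Using the result of Lemma A.1, we have

$$\int_0^1 K(x,y)\left[\cos(a(y - 1/2)) - \frac{2}{a}\sin(a/2)\right] dy$$

$$= \frac{\cos[a(x - 1/2)]}{1 + a^2} + \frac{(e^{-x} + e^{x-1})(a\sin(a/2) - \cos(a/2))}{2(1 + a^2)} + \frac{a\sin(a/2)(1 + e^{-1})}{1 + a^2} + \frac{\cos(a/2)(1 - e^{-1})}{1 + a^2} - \frac{1}{a}\sin(a/2)\left(4 - 2e^{-1} - e^{-x} - e^{-(1-x)}\right)$$

$$= \frac{\cos[a(x - 1/2)]}{1 + a^2} - \frac{2\sin(a/2)}{a(1 + a^2)} + \frac{\phi(x)}{a(1 + a^2)},$$

where

$$\phi(x) = 2\sin(a/2) + \tfrac12 a(e^{-x} + e^{x-1})(a\sin(a/2) - \cos(a/2)) + a^2\sin(a/2)(1 + e^{-1}) + a\cos(a/2)(1 - e^{-1}) - (1 + a^2)\sin(a/2)\left(4 - 2e^{-1} - e^{-x} - e^{-(1-x)}\right).$$

The first two terms on the right equal $f(x)/(1 + a^2)$, so $f$ is an eigenfunction with eigenvalue $1/(1 + a^2)$ exactly when $\phi \equiv 0$. The result follows by grouping the terms of $\phi(x)$ so that we see

$$\phi(x) = \left[2 - 4 + 2e^{-1} + e^{-x} + e^{-(1-x)}\right]\sin(a/2) + \left[\tfrac12 e^{-x} + \tfrac12 e^{x-1} + 1 + e^{-1} - 4 + 2e^{-1} + e^{-x} + e^{-(1-x)}\right]a^2\sin(a/2) + \left[-\tfrac12 e^{-x} - \tfrac12 e^{x-1} + 1 - e^{-1}\right]a\cos(a/2)$$

$$= \left[-\tfrac12 e^{-x} - \tfrac12 e^{x-1} + 1 - e^{-1}\right] \times \left[a\cos(a/2) - 2\sin(a/2) - 3a^2\sin(a/2)\right],$$

and the second factor vanishes precisely when $\tan(a/2) = a/(2 + 3a^2)$. $\square$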
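Because the regrouping of $\phi(x)$ is purely algebraic, it can be spot-checked numerically by evaluating the expanded and factored forms at random points. The function names and the sampling ranges below are arbitrary choices made for this check.

```python
import numpy as np

E1 = np.exp(-1.0)


def phi_expanded(x, a):
    """phi(x) exactly as defined in the proof, before regrouping."""
    s, c = np.sin(a / 2), np.cos(a / 2)
    e = np.exp(-x) + np.exp(x - 1)          # e^{-x} + e^{-(1-x)}
    return (2 * s + a * e * (a * s - c) / 2 + a**2 * s * (1 + E1)
            + a * c * (1 - E1) - (1 + a**2) * s * (4 - 2 * E1 - e))


def phi_factored(x, a):
    """[-e/2 + 1 - e^{-1}] * [a cos(a/2) - 2 sin(a/2) - 3 a^2 sin(a/2)]."""
    s, c = np.sin(a / 2), np.cos(a / 2)
    e = np.exp(-x) + np.exp(x - 1)
    return (-e / 2 + 1 - E1) * (a * c - 2 * s - 3 * a**2 * s)


rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, size=1000)
a = rng.uniform(-20.0, 20.0, size=1000)
print(np.max(np.abs(phi_expanded(x, a) - phi_factored(x, a))))  # round-off level
```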

Theorem A.2 states specific solutions to our integral equation. Now we show that in fact these are all the solutions with positive eigenvalues. To start, observe that for $0 \le x, y \le 1$, $e^{-1} \le e^{-|x-y|} \le 1$ and $2e^{-1/2} \le e^{-x} + e^{-(1-x)} \le 1 + e^{-1}$. Consequently,

$$-1 < \tfrac32 e^{-1} + 2e^{-1/2} - 2 \le K(x,y) \le 2e^{-1} - \tfrac12 < 1,$$

and so $\|K\|_\infty < 1$. In particular, if $\lambda$ is an eigenvalue of $K$, then $|\lambda| < 1$.
Now suppose $f$ is an eigenfunction of $K$, that is,

$$\lambda f(x) = \int_0^1 \left[\tfrac12\left(e^{-|x-y|} + e^{-x} + e^{-(1-x)} + e^{-y} + e^{-(1-y)}\right) + e^{-1} - 2\right] f(y)\,dy.$$

Taking the derivative with respect to $x$, we see that $f$ satisfies

$$\lambda f'(x) = \frac12\int_0^1 \left(-e^{-|x-y|}H_y(x) - e^{-x} + e^{-(1-x)}\right) f(y)\,dy, \tag{A-1}$$

where $H_y(x)$ is the Heaviside function, that is, $H_y(x) = 1$ for $x > y$ and $H_y(x) = -1$ for $x < y$. Taking the derivative again, we get

$$\lambda f''(x) = -f(x) + \frac12\int_0^1 \left(e^{-|x-y|} + e^{-x} + e^{-(1-x)}\right) f(y)\,dy. \tag{A-2}$$

Now, substituting back into the integral equation, we see

$$\lambda f(x) = \lambda f''(x) + f(x) + \int_0^1 \left[\tfrac12\left(e^{-y} + e^{-(1-y)}\right) + e^{-1} - 2\right] f(y)\,dy.$$

Taking one final derivative with respect to $x$, and setting $g(x) = f'(x)$, we see

$$g''(x) = \frac{\lambda - 1}{\lambda}\,g(x). \tag{A-3}$$

For $0 < \lambda < 1$, all the solutions to (A-3) can be written in the form

$$g(x) = A\sin(a(x - 1/2)) + B\cos(a(x - 1/2))$$

with $\lambda = 1/(1 + a^2)$. Consequently, $f(x)$ takes the form

$$f(x) = A\sin(a(x - 1/2)) + B\cos(a(x - 1/2)) + C.$$
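The relation $\lambda = 1/(1 + a^2)$ comes from writing the negative coefficient in (A-3) as $-a^2$. A short worked version of this step, for $0 < \lambda < 1$ (the symbol $a > 0$ is the same frequency as in Theorem A.2):

```latex
$$0<\lambda<1 \;\Longrightarrow\; \frac{\lambda-1}{\lambda}<0,
\qquad \text{so write} \qquad \frac{\lambda-1}{\lambda} = -a^2,\quad a>0 .$$
$$\lambda - 1 = -a^2\lambda
\;\Longrightarrow\; \lambda\,(1+a^2) = 1
\;\Longrightarrow\; \lambda = \frac{1}{1+a^2}.$$
$$g''(x) = -a^2\, g(x)
\;\Longrightarrow\; g(x) = A\sin(a(x-1/2)) + B\cos(a(x-1/2)),$$
$$f(x) = \int g(x)\,dx = -\frac{A}{a}\cos(a(x-1/2)) + \frac{B}{a}\sin(a(x-1/2)) + C .$$
```

The last line has the stated form for $f$ once the constants $-A/a$ and $B/a$ are relabeled.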

Note that since $\int_0^1 K(x,y)\,dy = 0$, the constant function $c(x) = 1$ is an eigenfunction of $K$ with eigenvalue $0$. Since $K$ is symmetric, for any eigenfunction $f$ with nonzero eigenvalue, $f$ is orthogonal to $c$ in $L^2(dx)$, that is, $\int_0^1 f(x)\,dx = 0$. In particular, for $0 < \lambda < 1$, without loss of generality we assume

$$f(x) = A\sin(a(x - 1/2)) + B\left[\cos(a(x - 1/2)) - \frac{2}{a}\sin(a/2)\right].$$

We solve for $a$, $A$ and $B$. First assume $B \ne 0$, and divide $f$ through by $B$. Then $f(1/2) = 1 - (2/a)\sin(a/2)$. Since $K(1/2, \cdot)$ is symmetric about $1/2$ and $\sin(a(y - 1/2))$ is skew-symmetric about $1/2$, we have

$$\frac{1 - (2/a)\sin(a/2)}{1 + a^2} = \lambda f(1/2) = \int_0^1 K(1/2, y) f(y)\,dy$$

$$= \frac12\int_0^1 \left(e^{-|y-1/2|} + e^{-y} + e^{-(1-y)}\right)\cos(a(y - 1/2))\,dy + \frac{2}{a}\sin(a/2)\left(e^{-1/2} + e^{-1} - 2\right)$$

$$= \frac{1}{1 + a^2} + \frac{e^{-1/2}(a\sin(a/2) - \cos(a/2))}{1 + a^2} + \frac{a\sin(a/2)(1 + e^{-1})}{1 + a^2} + \frac{\cos(a/2)(1 - e^{-1})}{1 + a^2} + \frac{2}{a}\sin(a/2)\left(e^{-1/2} + e^{-1} - 2\right).$$

The last equality follows from Lemma A.1. Equating the sides and multiplying through by $a(1 + a^2)$, $a$ satisfies

$$0 = 2\sin(a/2) + e^{-1/2}a(a\sin(a/2) - \cos(a/2)) + a^2\sin(a/2)(1 + e^{-1}) + a\cos(a/2)(1 - e^{-1}) + 2(1 + a^2)\sin(a/2)\left(e^{-1/2} + e^{-1} - 2\right)$$

$$= (1 - e^{-1/2} - e^{-1})\left(a\cos(a/2) - 2\sin(a/2) - 3a^2\sin(a/2)\right).$$

Since $1 - e^{-1/2} - e^{-1} \ne 0$, it is immediate that $\tan(a/2) = a/(2 + 3a^2)$. Now we suppose $A \ne 0$ and divide $f$ through by $A$. Then $f'(1/2) = a$ and, since $e^{-|y-1/2|}H_y(1/2)$ is skew-symmetric about $1/2$, (A-1) gives

$$\frac{a}{1 + a^2} = \lambda f'(1/2) = -\frac12\int_0^1 e^{-|y-1/2|}H_y(1/2) f(y)\,dy = -\frac12\int_0^1 e^{-|y-1/2|}H_y(1/2)\sin(a(y - 1/2))\,dy$$

$$= -\frac{e^{-1/2}}{1 + a^2}\left(a\cos(a/2) + \sin(a/2)\right) + \frac{a}{1 + a^2}.$$

In particular, $a\cot(a/2) = -1$.

The solutions of $\tan(a/2) = a/(2 + 3a^2)$ are approximately $2k\pi$ for integers $k$, and the solutions of $a\cot(a/2) = -1$ are approximately $(2k + 1)\pi$. Lemma A.2 makes this precise. Since the two equations have no common solutions, $A = 0$ if and only if $B \ne 0$. This completes the argument that Theorem A.2 lists all the eigenfunctions of $K$ with positive eigenvalues.
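As a numerical cross-check of Theorem A.2, the sketch below finds the smallest positive root of each equation by bisection on the first bracketing interval of Lemma A.2, then compares $\int_0^1 K(x,y) f(y)\,dy$ (midpoint rule) with $f(x)/(1 + a^2)$ at a few points. The helper `bisect`, the grid size and the test points are illustrative choices.

```python
import numpy as np


def K(x, y):
    """Centered continuous kernel of Theorem A.2."""
    return (0.5 * (np.exp(-np.abs(x - y)) + np.exp(-y) + np.exp(-(1 - y))
                   + np.exp(-x) + np.exp(-(1 - x))) + np.exp(-1) - 2)


def bisect(h, lo, hi, iters=100):
    """Root of h in [lo, hi], assuming h(lo) and h(hi) have opposite signs."""
    h_lo = h(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if (h_mid > 0) == (h_lo > 0):
            lo, h_lo = mid, h_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Smallest positive roots, bracketed by the k = 0 and k = 1 intervals of Lemma A.2.
a_sin = bisect(lambda a: a / np.tan(a / 2) + 1, np.pi, np.pi + 2 / np.pi)
a_cos = bisect(lambda a: np.tan(a / 2) - a / (2 + 3 * a**2),
               2 * np.pi, 2 * np.pi + 1 / (3 * np.pi))

eigenfunctions = {
    "sin": (a_sin, lambda x: np.sin(a_sin * (x - 0.5))),
    "cos": (a_cos, lambda x: np.cos(a_cos * (x - 0.5)) - (2 / a_cos) * np.sin(a_cos / 2)),
}

n = 100_000
y = (np.arange(n) + 0.5) / n                 # midpoint grid on [0, 1]
for name, (a, f) in eigenfunctions.items():
    lam = 1 / (1 + a**2)
    worst = max(abs(np.mean(K(x, y) * f(y)) - lam * f(x)) for x in (0.0, 0.2, 0.5, 0.9))
    print(name, round(a, 4), round(lam, 4), worst)
```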
Lemma A.2. 1. The positive solutions of $\tan(a/2) = a/(2 + 3a^2)$ lie in the set

$$\bigcup_{k=1}^{\infty}\left(2k\pi,\; 2k\pi + \frac{1}{3k\pi}\right),$$

with exactly one solution per interval. Furthermore, $a$ is a solution if and only if $-a$ is a solution.

2. The positive solutions of $a\cot(a/2) = -1$ lie in the set

$$\bigcup_{k=0}^{\infty}\left((2k + 1)\pi,\; (2k + 1)\pi + \frac{1}{k\pi + \pi/2}\right),$$

with exactly one solution per interval. Furthermore, $a$ is a solution if and only if $-a$ is a solution.

Proof. Let $f(\theta) = \tan(\theta/2) - \theta/(2 + 3\theta^2)$. Then $f$ is an odd function, so $a$ is a solution to $f(\theta) = 0$ if and only if $-a$ is a solution. Now,

$$f'(\theta) = \frac12\sec^2(\theta/2) + \frac{3\theta^2 - 2}{(3\theta^2 + 2)^2},$$

and so $f(\theta)$ is increasing for $\theta > \sqrt{2/3}$. Recall the power series expansion of $\tan\theta$ for $|\theta| < \pi/2$ is

$$\tan\theta = \theta + \theta^3/3 + 2\theta^5/15 + 17\theta^7/315 + \cdots.$$

In particular, for $0 < \theta < \pi/2$, $\tan\theta > \theta$. Consequently, for $\theta \in (0, \pi)$,

$$f(\theta) > \frac{\theta}{2} - \frac{\theta}{2 + 3\theta^2} > 0.$$

So $f$ has no roots in $(0, \pi)$, and is increasing in the domain in which we are interested. Furthermore, for $k \ge 1$,

$$f(2k\pi) < 0 < +\infty = \lim_{\theta \to (2k+1)\pi^-} f(\theta).$$

The third and fourth quadrants (that is, $\theta \bmod 2\pi \in (\pi, 2\pi)$) have no solutions, since there $\tan(\theta/2) < 0$ and hence $f(\theta) < 0$. This shows that the solutions to $f(\theta) = 0$ lie in the intervals

$$\bigcup_{k=1}^{\infty}\left(2k\pi,\; 2k\pi + \pi\right),$$

with exactly one solution per interval. Finally, for $k \in \mathbb{Z}_{\ge 1}$,

$$f\!\left(2k\pi + \frac{1}{3k\pi}\right) > \tan\!\left(k\pi + \frac{1}{6k\pi}\right) - \frac{1}{6k\pi} = \tan\!\left(\frac{1}{6k\pi}\right) - \frac{1}{6k\pi} > 0,$$

which gives the result.

To prove the second statement of the lemma, set $g(\theta) = \theta\cot(\theta/2)$. Then $g$ is even, so $g(a) = -1$ if and only if $g(-a) = -1$. Since $g'(\theta) = \cot(\theta/2) - (\theta/2)\csc^2(\theta/2)$, $g(\theta)$ is negative and decreasing in the third and fourth quadrants (assuming $\theta > 0$) and furthermore,

$$g((2k + 1)\pi) = 0 > -1 > -\infty = \lim_{\theta \to 2(k+1)\pi^-} g(\theta).$$

The first and second quadrants have no solutions since $g(\theta) > 0$ in those regions. This shows that the solutions to $g(\theta) = -1$ lie in the intervals

$$\bigcup_{k=0}^{\infty}\left((2k + 1)\pi,\; (2k + 1)\pi + \pi\right),$$

with exactly one solution per interval. Finally, for $k \in \mathbb{Z}_{\ge 0}$,

$$\begin{aligned}
g\!\left((2k + 1)\pi + \frac{1}{k\pi + \pi/2}\right)
&= \left((2k + 1)\pi + \frac{1}{k\pi + \pi/2}\right)\cot\!\left(k\pi + \frac{\pi}{2} + \frac{1}{2k\pi + \pi}\right) \\
&= \left((2k + 1)\pi + \frac{1}{k\pi + \pi/2}\right)\cot\!\left(\frac{\pi}{2} + \frac{1}{2k\pi + \pi}\right) \\
&= -\left((2k + 1)\pi + \frac{1}{k\pi + \pi/2}\right)\tan\!\left(\frac{1}{2k\pi + \pi}\right) \\
&< -1,
\end{aligned}$$

where the last step uses $\tan u > u$ for $0 < u < \pi/2$. This completes the proof. $\square$
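The bracketing intervals of Lemma A.2 can be checked numerically for the first few $k$. The sketch below verifies that each function changes sign across the stated interval, locates the root by bisection, and confirms the symmetry $a \mapsto -a$. It is a sanity check for small $k$ under the illustrative choice `k_max = 5`, not a substitute for the proof.

```python
import math


def bisect(h, lo, hi, iters=100):
    """Root of h in [lo, hi], assuming h(lo) and h(hi) have opposite signs."""
    h_lo = h(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if (h_mid > 0) == (h_lo > 0):
            lo, h_lo = mid, h_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def f(t):
    """f(theta) = tan(theta/2) - theta/(2 + 3 theta^2) from part 1."""
    return math.tan(t / 2) - t / (2 + 3 * t * t)


def g_plus_one(t):
    """g(theta) + 1 = theta cot(theta/2) + 1 from part 2."""
    return t / math.tan(t / 2) + 1


def lemma_a2_roots(k_max=5):
    results = []
    for k in range(1, k_max + 1):
        lo, hi = 2 * k * math.pi, 2 * k * math.pi + 1 / (3 * k * math.pi)
        assert f(lo) < 0 < f(hi), "sign change expected on part-1 interval"
        results.append(("tan", k, bisect(f, lo, hi), lo, hi))
    for k in range(0, k_max + 1):
        lo = (2 * k + 1) * math.pi
        hi = lo + 1 / (k * math.pi + math.pi / 2)
        assert g_plus_one(lo) > 0 > g_plus_one(hi), "sign change expected on part-2 interval"
        results.append(("cot", k, bisect(g_plus_one, lo, hi), lo, hi))
    return results


for kind, k, root, lo, hi in lemma_a2_roots():
    print(f"{kind} k={k}: root={root:.6f} in ({lo:.6f}, {hi:.6f})")
```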

The exact eigenfunctions for the continuous kernel yield approximate 
eigenfunctions and eigenvalues for the discrete case. Here we give the proof 
of Theorem 3.1. 

Proof of Theorem 3.1. That $f$ and $g$ are approximate eigenfunctions for the discrete matrix follows directly from Theorem A.2. Suppose $K$ is the continuous kernel. Then,

$$S_n f_{n,a}(x_i) = \sum_{j=1}^{n} S_n(x_i, x_j)\left[\cos(a(j/n - 1/2)) - (2/a)\sin(a/2)\right]$$

$$= \int_0^1 K(x_i, y)\left[\cos(a(y - 1/2)) - (2/a)\sin(a/2)\right] dy + R_{f,n}$$

$$= \frac{1}{1 + a^2} f_{n,a}(x_i) + R_{f,n},$$

where the error term satisfies

$$|R_{f,n}| \le \frac{M}{2n} \quad\text{for}\quad M \ge \sup_{0 \le y \le 1}\left|\frac{\partial}{\partial y}\, K(x_i, y)\left[\cos(a(y - 1/2)) - (2/a)\sin(a/2)\right]\right|$$

by the standard right-hand rule error bound. In particular, we can take $M = a + 4$ independent of $i$, from which the result for $f_{n,a}$ follows. The case of $g_{n,a}$ is analogous. $\square$
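Theorem 3.1 predicts that the discretized kernel has eigenvalues close to $1/(1 + a^2)$. The sketch below assumes, as in the proof above, that $S_n(x_i, x_j) = K(x_i, x_j)/n$ with $x_i = i/n$, and compares the two largest eigenvalues of $S_n$ with $1/(1 + a^2)$ for the smallest roots of $a\cot(a/2) = -1$ and $\tan(a/2) = a/(2 + 3a^2)$. The choice $n = 1000$ and the `bisect` helper are illustrative.

```python
import numpy as np


def K(x, y):
    """Centered continuous kernel of Theorem A.2 (vectorized)."""
    return (0.5 * (np.exp(-np.abs(x - y)) + np.exp(-y) + np.exp(-(1 - y))
                   + np.exp(-x) + np.exp(-(1 - x))) + np.exp(-1) - 2)


def bisect(h, lo, hi, iters=100):
    """Root of h in [lo, hi], assuming h(lo) and h(hi) have opposite signs."""
    h_lo = h(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if (h_mid > 0) == (h_lo > 0):
            lo, h_lo = mid, h_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


n = 1000
x = np.arange(1, n + 1) / n                      # x_i = i/n
S = K(x[:, None], x[None, :]) / n                # S_n(x_i, x_j) = K(x_i, x_j)/n
eigs = np.linalg.eigvalsh(S)[::-1]               # descending order

a_sin = bisect(lambda a: a / np.tan(a / 2) + 1, np.pi, np.pi + 2 / np.pi)
a_cos = bisect(lambda a: np.tan(a / 2) - a / (2 + 3 * a**2),
               2 * np.pi, 2 * np.pi + 1 / (3 * np.pi))
predicted = [1 / (1 + a_sin**2), 1 / (1 + a_cos**2)]

print("largest eigenvalues:", eigs[:2])
print("predicted 1/(1+a^2):", predicted)
```

The leading eigenvalue comes from the sine family and the second from the cosine family, matching the order of the smallest roots.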

The version of this theorem for uncentered matrices is as follows: 



Theorem A.3. For $1 \le i, j \le n$, consider the matrices defined by

$$A_n(i,j) = \frac{1}{2n}e^{-|i-j|/n} \quad\text{and}\quad S_n = A_n - \frac{1}{2n}\mathbf{1}\mathbf{1}^{T}.$$

1. Set $f_{n,a}(x_i) = \cos(a(i/n - 1/2))$, where $a$ is a positive solution to $a\tan(a/2) = 1$. Then

$$A_n f_{n,a}(x_i) = \frac{1}{1 + a^2} f_{n,a}(x_i) + R_{f,n}, \qquad |R_{f,n}| = O(1/n) \text{ uniformly in } i.$$

2. Set $g_{n,a}(x_i) = \sin(a(i/n - 1/2))$, where $a$ is a positive solution to $a\cot(a/2) = -1$. Then

$$S_n g_{n,a}(x_i) = \frac{1}{1 + a^2} g_{n,a}(x_i) + R_{g,n}, \qquad |R_{g,n}| = O(1/n) \text{ uniformly in } i.$$

That is, $f_{n,a}$ and $g_{n,a}$ are approximate eigenfunctions of $A_n$ and $S_n$.

The proof of Theorem A.3 is analogous to that of Theorem 3.1, by way of Lemma A.1, and so is omitted here.



Proof of Theorem 3.2. Let $\bar f_{n,a} = f_{n,a}/\|f_{n,a}\|_2$. Then by Theorem 3.1,

$$\left|S_n \bar f_{n,a}(x_i) - \frac{1}{1 + a^2}\bar f_{n,a}(x_i)\right| \le \frac{a + 4}{2n\|f_{n,a}\|_2}$$

and, consequently,

$$\left\|S_n \bar f_{n,a} - \frac{1}{1 + a^2}\bar f_{n,a}\right\|_2 \le \frac{a + 4}{2\sqrt{n}\,\|f_{n,a}\|_2}.$$

By Lemma A.2, $a$ lies in one of the intervals $(2k\pi, 2k\pi + 1/(3k\pi))$ for $k \ge 1$. Then

$$|f_{n,a}(x_n)| = |\cos(a/2) - (2/a)\sin(a/2)| \ge \cos(1/(6\pi)) - 1/\pi > 1/2.$$

Consequently,

$$\|f_{n,a}\|_2 \ge |f_{n,a}(x_n)| > 1/2,$$

and so the first statement of the result follows from Theorem A.1. The second statement is analogous. $\square$
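The constant in the last step is easy to evaluate: $\cos(1/(6\pi)) - 1/\pi \approx 0.680$. The sketch below checks this, and also evaluates $|\cos(a/2) - (2/a)\sin(a/2)|$, the value of $f_{n,a}$ at $x_n = 1$, at the first few roots of $\tan(a/2) = a/(2 + 3a^2)$, found by bisection on the intervals of Lemma A.2; the `bisect` helper and the range $k = 1, \dots, 5$ are illustrative choices.

```python
import math


def bisect(h, lo, hi, iters=100):
    """Root of h in [lo, hi], assuming h(lo) and h(hi) have opposite signs."""
    h_lo = h(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        h_mid = h(mid)
        if (h_mid > 0) == (h_lo > 0):
            lo, h_lo = mid, h_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


lower_bound = math.cos(1 / (6 * math.pi)) - 1 / math.pi
print(f"cos(1/(6 pi)) - 1/pi = {lower_bound:.4f}")   # 0.6803

endpoint_values = []
for k in range(1, 6):
    lo, hi = 2 * k * math.pi, 2 * k * math.pi + 1 / (3 * k * math.pi)
    a = bisect(lambda t: math.tan(t / 2) - t / (2 + 3 * t * t), lo, hi)
    value = abs(math.cos(a / 2) - (2 / a) * math.sin(a / 2))   # |f_{n,a}(x_n)|
    endpoint_values.append(value)
    print(f"k={k}: a={a:.5f}, |f(x_n)|={value:.5f}")
```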
SUPPLEMENTARY MATERIAL

Supplementary files for "Horseshoes in multidimensional scaling and local kernel methods" (DOI: 10.1214/08-AOAS165SUPP; .tar). This directory [Diaconis, Goel and Holmes (2008)] contains both the matlab (mds_analysis.m) and R files (mdsanalysis.r) and the original data (voting_record2005.txt, voting_record_description.txt, house_members_description.txt, house_members2005.txt, house_party2005.txt) as well as the transformed data (reduced_voting_record2005.txt, reduced_house_party2005.txt).